Fast and Lightweight LCP-Array Construction Algorithms
Authors
Abstract
The suffix tree is a very important data structure in string processing, but it consumes a large amount of space. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the lengths of the longest common prefixes of lexicographically adjacent suffixes, and it can be computed in linear time. In this paper, we present new LCP-array construction algorithms that are fast and very space efficient. In practice, our algorithms outperform the best currently known algorithms on large inputs.
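To make the definition of the LCP-array concrete, here is a minimal Python sketch. It is not one of the algorithms presented in the paper: it uses the classic linear-time approach usually attributed to Kasai et al., and the suffix array is built by naive sorting purely for illustration. The function names `suffix_array` and `lcp_array` are assumptions chosen for this example.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix text.

    Illustration only; real constructions are far more efficient.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """LCP-array via the Kasai et al. scheme (assumed here, not the paper's method).

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is 0 by convention.
    """
    n = len(text)
    rank = [0] * n                      # rank[i] = position of suffix i in sa
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0                               # current match length, reused across steps
    for i in range(n):                  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                  # next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

The linear running time comes from the counter `h`: it drops by at most one per step, so the total number of character comparisons stays proportional to the text length. The sketch keeps the text, the suffix array, the rank array, and the result in memory at once, which is the kind of space overhead that lightweight algorithms aim to reduce.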
Similar resources
Lightweight LCP-Array Construction in Linear Time
The suffix tree is a very important data structure in string processing, but it consumes a large amount of space. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...
Critique "Lightweight LCP Construction for Next-Generation Sequencing Datasets"
The paper presents the first lightweight method that simultaneously computes the longest common prefix (LCP) array and the BWT of very large collections of sequences. Knowing the LCP of a collection of DNA sequences would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been describ...
Permuted Longest-Common-Prefix Array
The longest-common-prefix (LCP) array is an adjunct to the suffix array that allows many string processing problems to be solved in optimal time and space. Its construction is a bottleneck in practice, taking almost as long as suffix array construction. In this paper, we describe algorithms for constructing the permuted LCP (PLCP) array in which the values appear in position order rather than l...
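As a rough illustration of what "position order" means, the sketch below computes a permuted LCP array in which entry `i` belongs to the suffix starting at text position `i`. It is a minimal sketch under stated assumptions, not the algorithms of the paper: the suffix array is built by naive sorting, the helper names `suffix_array` and `plcp_array` are hypothetical, and the auxiliary array `phi` (mapping each suffix to its lexicographic predecessor) is one common way to organize the computation.

```python
def suffix_array(text):
    """Naive suffix array by sorting suffixes; for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def plcp_array(text, sa):
    """Permuted LCP array: plcp[i] is the LCP of suffix i and its
    lexicographic predecessor, stored at text position i."""
    n = len(text)
    phi = [-1] * n                      # phi[i] = predecessor of suffix i, -1 if none
    for r in range(1, n):
        phi[sa[r]] = sa[r - 1]

    plcp = [0] * n
    h = 0
    for i in range(n):                  # strictly left-to-right over the text
        j = phi[i]
        if j < 0:
            h = 0                       # lexicographically smallest suffix
        else:
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
        plcp[i] = h
        if h > 0:
            h -= 1                      # plcp[i + 1] >= plcp[i] - 1
    return plcp


text = "banana"
sa = suffix_array(text)
plcp = plcp_array(text, sa)
print(plcp)                             # [0, 3, 2, 1, 0, 0]
print([plcp[start] for start in sa])    # [0, 1, 3, 0, 0, 2], the ordinary LCP array
```

Reading the permuted values back through the suffix array, as in the last line, recovers the LCP array in its usual lexicographic order.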
Lightweight LCP construction for very large collections of strings
The longest common prefix array is a very useful data structure that, combined with the suffix array and the Burrows-Wheeler transform, makes it possible to efficiently compute some combinatorial properties of a string that are useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from “next-g...
Faster External Memory LCP Array Construction
The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Their construction is often a major bottleneck, especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demo...